Abstract: Secure distributed data compression in the presence 
of an eavesdropper is explored. Two correlated sources that need 
to be reliably transmitted to a legitimate receiver are available 
at separate encoders. Noise-free, limited rate links from the 
encoders to the legitimate receiver, one of which can also be 
perfectly observed by the eavesdropper, are considered. The 
eavesdropper also has its own correlated observation. Inner 
and outer bounds on the achievable compression-equivocation 
rate region are given. Several different scenarios involving 
the side information at the transmitters as well as multiple 
receivers/eavesdroppers are also considered. 

I. Introduction 

With the emergence of wireless sensor networks and dis- 
tributed video applications, distributed source compression 
has become an important research area. A significant amount 
of effort has been devoted to understanding the information 
theoretic limits of distributed lossless and lossy compression 
and developing codes to achieve these limits. However, in 
many real-life applications involving distributed compression, 
such as distributed video surveillance or monitoring of some 
private information, secure compression and communication 
while meeting the end-to-end quality of service requirements 
becomes important. In this paper we consider the information 
theoretic limits of secure lossless source compression in the 
presence of an adversary who has access to some of the links 
in the network as well as its own correlated observation of 
the data to be compressed. We consider information theoretic 
secrecy, that is, we want to limit the information leakage to 
a computationally unbounded eavesdropper who has the full 
knowledge of the compression algorithms used. 

We first consider a simplified model of the general secure 
distributed compression problem, composed of two transmit- 
ters Alice and Charlie with correlated observations, a receiver 
Bob, and an eavesdropper Eve who is interested in the data 
of Alice. Eve eavesdrops on Alice's channel to Bob, i.e., she 
knows Alice's message to Bob exactly. Eve also has her own 
correlated side information. We consider the scenario in which 
both Alice's and Charlie's data need to be reconstructed at 
Bob reliably while Eve is interested in only Alice's infor- 
mation source. Later, we consider various cases involving 
the availability of the side information at different terminals. 

This research was supported in part by the US National Science Foundation 
under Grants ANI-03-38807, CCF-04-30885, CCF-06-35177, CCF-07-28208, 
and CNS-06-25637. 
Fig. 1. Two terminal secure distributed compression. The eavesdropper (Eve) 
can only access one of the links. 



Finally, we analyze cases with multiple receivers or multiple 
eavesdroppers. 

In Wyner's classical wiretap channel model [1], nonzero 
secrecy rate can be achieved without using a secure key, if the 
intended receiver has a better quality communication channel 
than the eavesdropper. It was observed in [3] and [4] that 
secrecy can also be generated through correlated observations 
at the legitimate users. In our model, since the channels 
are not noisy, the techniques of [1] do not apply; however, 
based on the ideas of [3], [4], it is still possible to achieve 
secrecy by exploiting the correlated information transmitted 
over secure links. Unlike [3], [4], which focus on generating 
a secret key using correlated information sources, we impose 
the requirement of lossless decoding of the source sequence 
at the legitimate receiver while keeping Alice's information 
secret from Eve. 

In [8], Yamamoto considers lossy compression with security 
constraints over a noisy broadcast channel, while the users 
share a secure key as well. He showed that first applying 
lossy source compression, then encrypting the compressed 
bits using the secure key and finally transmitting over the 
channel using a good wiretap channel code is optimal. In [5], 
Merhav extends this result to the case in which the legitimate 
receiver and the eavesdropper have correlated side information 
under the assumption that both the channel output and the 
side information at the eavesdropper are physically degraded. 
He shows that replacing lossy compression with Wyner-Ziv 
compression in the coding scheme of [8] is optimal. In [6], 
the minimum leakage rate in secure lossless compression with 
arbitrary side information is explored. It is shown in [6] that, 
in the case of arbitrarily correlated receiver side information, 
the usual Slepian-Wolf compression is not always sufficient. 



Secure lossless compression of two correlated sources is 
considered in [7], where the eavesdropper has access to only 
one of the compressed bit streams and has no side information. 
Slepian-Wolf compression suffices in this setup due to the lack 
of side information at the eavesdropper. 

We introduce the system model in Section II. In Section III 
we give inner and outer bounds to the achievable 
compression-equivocation rate region that generalize the 
well-known Slepian-Wolf region to include secrecy constraints. In 
Section IV we consider various scenarios based on 
the availability of the side information, and we also consider 
multiple legitimate receivers or multiple eavesdroppers. 

II. System Model 

For the model in Fig. 1, we assume that Alice and Charlie 
have access to length-$N$ correlated source sequences $A^N$ and 
$C^N$, respectively. They want to transmit these sources to 
Bob reliably over separate noise-free, finite capacity channels. 
Alice's transmission will also be perfectly received by an 
eavesdropper called Eve, who has her own correlated side 
information $E^N$. We model $A^N$, $C^N$, and $E^N$ as being 
generated independent and identically distributed (i.i.d.) according 
to the joint probability distribution $p_{ACE}(a,c,e)$ over the 
finite alphabet $\mathcal{A} \times \mathcal{C} \times \mathcal{E}$. While Alice and Charlie want 
to transmit their sources reliably to Bob, they also want to 
maximize the equivocation at Eve, which represents the 
uncertainty of Eve about $A^N$ after receiving Alice's transmission 
and combining it with her own side information $E^N$. 
We will also consider scenarios involving multiple legitimate 
receivers/eavesdroppers for which similar definitions apply. 
Throughout the paper we assume that all the transmissions 
are authenticated, i.e., the eavesdropper is passive. 

An $(M_A, M_C, N)$ code for secure source compression in 
this setup is composed of an encoding function$^1$ at Alice, 
$f_A : \mathcal{A}^N \to I_{M_A}$, an encoding function at Charlie, 
$f_C : \mathcal{C}^N \to I_{M_C}$, and a decoding function at Bob, 
$g : I_{M_A} \times I_{M_C} \to \mathcal{A}^N \times \mathcal{C}^N$, where $I_k$ denotes the set $\{1, \ldots, k\}$ for 
$k \in \mathbb{Z}^+$. The equivocation rate of this code is defined as 
$\frac{1}{N} H(A^N | f_A(A^N), E^N)$, and the error probability as 
$P_e^N = \Pr\{g(f_A(A^N), f_C(C^N)) \neq (A^N, C^N)\}$. 

Definition 2.1: We say that $(R_A, R_C, \Delta)$ is achievable if, 
for any $\epsilon > 0$, there exists an $(M_A, M_C, N)$ code such 
that $\log(M_A) \le N(R_A + \epsilon)$, $\log(M_C) \le N(R_C + \epsilon)$, 
$H(A^N | f_A(A^N), E^N) \ge N(\Delta - \epsilon)$ and $P_e^N < \epsilon$. Let $\mathcal{R}$ denote 
the set of all achievable $(R_A, R_C, \Delta)$ triplets. 

III. Secure Distributed Compression 

For the model in Section II, when we remove the secrecy 
requirements, the problem reduces to the well-known Slepian- 
Wolf coding of correlated sources. However, the solution in 
the case of distributed compression with secrecy constraints is 
not a direct extension of the Slepian-Wolf theorem. 

$^1$We assume deterministic coding in the analysis for simplicity, but the 
proofs follow similarly for randomized coding which is modeled by assuming 
independent random variables at the terminals and deterministic coding 
functions that depend on these random variables. 



Definition 3.1: Let $U$ and $V$ be two random variables 
jointly distributed with $A$, $C$ and $E$ and taking values over the 
finite alphabets $\mathcal{U}$ and $\mathcal{V}$. We define $\mathcal{V}_{in}$ as the set of $(U, V)$ 
that satisfy $H(C|A,V) = 0$ with a joint distribution of the 
form $p_{ACE}\, p_{U|A}\, p_{V|C}$. We define $\mathcal{V}_{out}$ as the set of $(U,V)$ 
that satisfy $H(C|A,V) = 0$ and the Markov chain conditions 
$U - A - (C,E)$ and $V - C - (A,E)$. 

Definition 3.2: We define $\mathcal{R}_{in}$ as the convex hull of the set 
of all $(R_A, R_C, \Delta)$ for which there exists $(U, V) \in \mathcal{V}_{in}$ such 
that 
where $[x]^+ \triangleq \max\{x, 0\}$, and $\mathcal{R}_{out}$ as the convex hull of the 
set of all $(R_A, R_C, \Delta)$ for which there exists $(U,V) \in \mathcal{V}_{out}$ 
such that (1)-(5) hold. 

Our main result is the following inner and outer bounds on 
the set of all achievable triplets of Definition 2.1. 

Theorem 3.1: $\mathcal{R}_{in} \subseteq \mathcal{R} \subseteq \mathcal{R}_{out}$. 

Proof: The sketch of the proof is given in Appendix I. ■ 

When the eavesdropper has no side information, i.e., E is 
constant, then the inner and outer bounds become tight. The 
following corollary can be obtained similarly to Theorem 3.1. 

Corollary 3.2: When there is no side information at the 
eavesdropper, the compression-equivocation rate region is 
characterized by 

\begin{align*}
R_C &\ge H(C|A), \\
R_A &\ge H(A|C), \\
R_A + R_C &\ge H(A,C), \text{ and} \\
[H(A) - R_A]^+ \le \Delta &\le \min\{I(A;C),\, R_C - H(C|A)\}.
\end{align*}
We skip the proof, which can also be obtained as a special 
case of the result in [7]. The achievability is simply from the 
usual Slepian-Wolf compression, that is, we let U be constant. 
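
The conditions of Corollary 3.2 can be evaluated numerically for a given joint distribution of $(A, C)$. The following is a minimal sketch, not taken from the paper: the function names, the list-of-lists representation of the pmf, and the example source are all illustrative assumptions.

```python
import math


def entropy(pmf):
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in pmf if p > 0)


def corollary_3_2_bounds(p_ac, r_a, r_c):
    """Evaluate the Corollary 3.2 conditions for a joint pmf p_ac[a][c].

    Returns (rates_ok, delta_low, delta_high): whether (r_a, r_c) satisfies
    the three Slepian-Wolf rate conditions, and the lower and upper limits
    on the equivocation rate stated in the corollary.
    """
    h_ac = entropy(p for row in p_ac for p in row)
    h_a = entropy(sum(row) for row in p_ac)
    h_c = entropy(sum(col) for col in zip(*p_ac))
    h_a_given_c = h_ac - h_c
    h_c_given_a = h_ac - h_a
    i_ac = h_a + h_c - h_ac
    rates_ok = (r_c >= h_c_given_a
                and r_a >= h_a_given_c
                and r_a + r_c >= h_ac)
    delta_low = max(h_a - r_a, 0.0)
    delta_high = min(i_ac, r_c - h_c_given_a)
    return rates_ok, delta_low, delta_high


# Hypothetical example: uniform binary A, with C equal to A flipped
# with probability 0.1.
p_ac = [[0.45, 0.05], [0.05, 0.45]]
print(corollary_3_2_bounds(p_ac, r_a=0.5, r_c=1.0))
```

In this example the interval for $\Delta$ is non-empty only because $R_C$ is large enough that Charlie's link carries most of the information about $A$.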

IV. Uncoded Side Information at Bob 



A special case of Theorem 3.1 is obtained when we have 
$R_C \ge H(C)$, that is, $C^N$ can be recovered by Bob with 
an arbitrarily small probability of error. Equivalently, we can 
assume that the side information sequence $B^N = C^N$ is 
available directly to Bob. The compression-equivocation rate 
region for this special case is given below, and it can be 
obtained from Theorem 3.1. 

Corollary 4.1: For uncoded side information at Bob, 
$(R_A, \Delta)$ is an achievable compression-equivocation rate pair 
if and only if 

\begin{align}
R_A &\ge H(A|B), \tag{6}\\
0 \le \Delta &\le [I(A;B|U) - I(A;E|U)]^+, \text{ and} \tag{7}\\
R_A + \Delta &\ge H(A|E), \tag{8}
\end{align}

for some $U$ such that $U - A - (B, E)$ form a Markov chain. 

While Corollary 4.1 requires an auxiliary codebook 
generated by $U$ in the general case to conceal the source from the 
eavesdropper, in [9] we show that, when Bob's side 
information $B$ is less noisy [2] than Eve's side information $E$, 
Slepian-Wolf binning achieves the highest possible equivocation rate, 
i.e., (7) is maximized by a constant $U$. Furthermore, when 
Bob's side information is a stochastically degraded version 
of Eve's side information, no positive equivocation rate is 
achievable, and $\Delta = 0$. 
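
To illustrate the two extreme cases above, the bounds of Corollary 4.1 can be evaluated with a constant $U$ for a simple binary model. This is a minimal sketch under assumptions that are not in the paper: $A$ is a uniform bit, and $B$ and $E$ are the outputs of binary symmetric channels with crossover probabilities `p_bob` and `p_eve` (both at most 1/2), so that the side information with the larger crossover probability is a stochastically degraded version of the other.

```python
import math


def h2(p):
    """Binary entropy function in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def constant_u_bounds(p_bob, p_eve):
    """Bounds (6) and (7) with constant U for the assumed binary model.

    Returns (min_rate, max_equivocation), where min_rate = H(A|B) and
    max_equivocation = [I(A;B) - I(A;E)]^+.
    """
    i_ab = 1.0 - h2(p_bob)  # I(A;B) for uniform A through a BSC
    i_ae = 1.0 - h2(p_eve)  # I(A;E) for uniform A through a BSC
    return h2(p_bob), max(i_ab - i_ae, 0.0)


# Bob has the better side information: positive equivocation rate.
print(constant_u_bounds(p_bob=0.1, p_eve=0.2))
# Bob's side information is degraded with respect to Eve's: zero.
print(constant_u_bounds(p_bob=0.2, p_eve=0.1))
```

In the first case the returned pair meets (8) with equality, since $H(A|B) + I(A;B) - I(A;E) = H(A|E)$ for this model.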

In [9], we also show that the availability of either Bob's 
or Eve's side information at Alice potentially increases the 
equivocation rate of the eavesdropper, while the compression 
rate bound on $R_A$ remains intact. When Alice does not have 
access to $B^N$, $\Delta = 0$ if $B$ is independent of $A$. However, 
when $B^N$ is available to Alice, it is useful even if they are 
independent. This scenario is equivalent to Shannon's secret 
key model, in which a secure key, independent of the message, 
at rate H{B) is shared by Alice and Bob. 

Lemma 4.2: If $B$ is independent of $(A, E)$ and available to 
Alice as well, then $(R_A, \Delta)$ is achievable if and only if 

\begin{align*}
R_A &\ge H(A) \text{ and} \\
0 \le \Delta &\le \min\{H(B), H(A|E)\}.
\end{align*}

Proof: The achievability follows by first compressing 
the source $A^N$ and then encrypting the compressed source 
bits with the secure shared key $B^N$, i.e., a one-time pad. The 
converse can be obtained from Corollary 4.1. ■ 
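
The compress-then-encrypt step in the proof can be sketched in a few lines. This is only an illustration under stated assumptions: `zlib` stands in for an ideal lossless source code, a byte string stands in for the shared key $B^N$, and the function names are hypothetical. The sketch covers the case in which the key is at least as long as the compressed source; it does not model partial encryption when the key is shorter.

```python
import zlib


def xor_bytes(data, key):
    """XOR data with the leading bytes of key (one-time pad)."""
    if len(key) < len(data):
        raise ValueError("key must be at least as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))


def alice_encode(source, key):
    """Compress the source, then encrypt the compressed bits with the key."""
    return xor_bytes(zlib.compress(source), key)


def bob_decode(ciphertext, key):
    """Decrypt with the shared key, then decompress."""
    return zlib.decompress(xor_bytes(ciphertext, key))


# Usage: the key must be uniformly random, used once, and unknown to Eve.
# A fixed key is used here only to keep the example reproducible.
key = bytes(range(256))
message = b"sensor reading 42; " * 20
assert bob_decode(alice_encode(message, key), key) == message
```
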

Now we compare two scenarios of having Eve's side 
information at Alice or Bob. When $E^N$ is available to Bob, the 
side information of Eve becomes a physically degraded version 
of Bob's side information, and Slepian-Wolf compression 
would suffice [9]. The compression-equivocation rate region 
is characterized by 

\begin{align*}
R_A &\ge H(A|B,E), \\
0 \le \Delta &\le I(A;B|E), \text{ and} \\
R_A + \Delta &\ge H(A|E).
\end{align*}

Note that having $E^N$ at Bob helps decrease the compression 
rate bound on $R_A$ as well. Furthermore, as shown in [9], 
providing $E^N$ to Alice in addition to Bob would not help. 

On the other hand, when $E^N$ is available to Alice, the 
compression-equivocation rate region is given by [9] $R_A \ge H(A|B)$, 
$0 \le \Delta \le I(A;B|E)$ and $R_A + \Delta \ge H(A|E)$. 
Comparing the two regions when $E^N$ is available to Alice or 
to Bob, we see that the latter requires a smaller compression 
rate due to better side information at the receiver, while the 
equivocation rates are equal. 

V. Multiple Legitimate Receivers/Eavesdroppers 

Consider $K$ legitimate receivers, each with its own 
correlated side information $B_k$ for $k = 1, \ldots, K$, that want to 
receive Alice's information reliably, while there is only one 
eavesdropper Eve. In the absence of an eavesdropper, a rate 
of $\max_k H(A|B_k)$ is necessary and sufficient for simultaneous 
reliable transmission to all the receivers [11]. 



From Corollary 4.1, considering each receiver separately, 
the equivocation rate is bounded as $\Delta \le \max\{H(A|E, U_k) - H(A|B_k, U_k)\}$, 
where the maximization is over $U_k$ satisfying 
the Markov chain $U_k - A - (B_k, E)$ for $k = 1, \ldots, K$. 
The minimum of these individual equivocation rate bounds 
serves as an upper bound; however, its achievability together 
with $R_A \ge \max_k H(A|B_k)$ does not follow directly. The 
achievability proof outlined in Appendix I requires an auxiliary 
codeword to be decoded by the receiver. However, for multiple 
receivers, the auxiliary codebook $U_k$ that maximizes the 
equivocation rate for one of the users might not be decodable by 
another user. Imposing such a decoding constraint requires a total 
transmission rate of $\max_k I(A; U_k|B_k) + \max_k H(A|B_k, U_k)$, 
which might be greater than $\max_k H(A|B_k)$, the required rate 
without the secrecy constraint. 

Below, we give the compression-equivocation rate region in 
case of multiple receivers for two special cases. Proofs are 
omitted due to space limitation. 

Corollary 5.1: If $A - B_k - E$ form a Markov chain for all 
$k = 1, \ldots, K$, then $(R_A, \Delta)$ is achievable if and only if 

\begin{align*}
R_A &\ge \max_k H(A|B_k), \\
\Delta &\le \min_k \big(H(A|E) - H(A|B_k)\big), \text{ and} \\
R_A + \Delta &\ge H(A|E).
\end{align*}

Corollary 5.2: If $A - B_1 - \cdots - B_K$ form a Markov chain, 
then $(R_A, \Delta)$ is achievable if and only if 

\begin{align*}
R_A &\ge H(A|B_K), \\
\Delta &\le \max\{I(A;B_K|U) - I(A;E|U)\}, \text{ and} \\
R_A + \Delta &\ge H(A|E),
\end{align*}

where the maximization is over auxiliary random variables $U$ 
such that $U - A - (B_1, \ldots, B_K, E)$ form a Markov chain. 

In Corollary 5.1, due to the degradedness of $E$ with respect 
to the $B_k$'s, picking a constant $U$ is optimal for all receivers. In 
Corollary 5.2, we use the auxiliary codebook $U$ that is chosen 
with respect to the worst receiver side information $B_K$. 
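
As a small numerical illustration of Corollary 5.1, the sketch below evaluates the rate and equivocation limits for several receivers. The model is an assumption made only for this example: $A$ is a uniform bit, and each $B_k$ and $E$ are observed through binary symmetric channels, so that $H(A|B_k)$ and $H(A|E)$ are binary entropies of the crossover probabilities. The corollary's limits depend on the joint distribution only through these conditional entropies. The function names are hypothetical.

```python
import math


def h2(p):
    """Binary entropy function in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def multi_receiver_bounds(p_receivers, p_eve):
    """Limits of Corollary 5.1 for the assumed binary model.

    p_receivers: crossover probabilities for B_1, ..., B_K (each <= p_eve).
    Returns (min_rate, max_equivocation).
    """
    h_a_given_b = [h2(p) for p in p_receivers]
    h_a_given_e = h2(p_eve)
    min_rate = max(h_a_given_b)  # max_k H(A|B_k)
    max_equivocation = min(h_a_given_e - h for h in h_a_given_b)
    return min_rate, max_equivocation


# Two receivers; the one with the noisier side information sets both limits.
print(multi_receiver_bounds([0.05, 0.1], p_eve=0.2))
```
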

Similarly, there may be multiple non-cooperating eavesdrop- 
pers all of which have their own correlated side information. 
Suppose there are $K$ eavesdroppers, the $k$-th of which has side 
information $E_k$. We have $K$ equivocation rates defined as 
This time, we pick the auxiliary codebook that simultaneously 
achieves the corresponding equivocation rates $\Delta_k$. 

Corollary 5.3: $(R_A, \Delta_1, \ldots, \Delta_K)$ is achievable for the multiple 
eavesdropper scenario if and only if 

\begin{align*}
R_A &\ge H(A|B), \\
\Delta_k &\le [H(A|E_k, U) - H(A|B, U)]^+, \text{ and} \\
R_A + \Delta_k &\ge H(A|E_k),
\end{align*}

for $k = 1, \ldots, K$ for some auxiliary random variable $U$ 
satisfying the Markov chain condition $U - A - (B, E_1, \ldots, E_K)$. 



VI. Conclusion 

In this paper, we have considered secure distributed 
compression in the presence of an eavesdropper. We have studied 
the case in which one of the transmitters is wire-tapped by an 
eavesdropper and we have shown that secure communication 
can be achieved with the help of the second transmitter who 
has its own correlated side information and a secure link to 
the legitimate receiver. We have provided inner and outer 
bounds to the compression-equivocation rate region for the 
model studied. We have also considered availability of side 
information at the transmitters, multiple legitimate receivers 
or multiple eavesdroppers. Future directions include extension 
to the lossy compression scenario. 

Appendix I 
Proof of Theorem 3.1 

Inner bound: We fix $p(u|a)$ and $p(v|c)$ satisfying the 
conditions in the theorem. Then we generate $2^{N(I(A;U)+\epsilon_1)}$ 
independent codewords of length $N$, $U^N(w_1)$, $w_1 \in 
\{1, \ldots, 2^{N(I(A;U)+\epsilon_1)}\}$, with distribution $\prod_{i=1}^N p(u_i)$. We 
randomly bin all $U^N(w_1)$ sequences into $2^{N(I(A;U|V)+\epsilon_2)}$ bins, 
calling them the auxiliary bins. For each codeword $U^N(w_1)$, 
we denote the corresponding auxiliary bin index as $a(w_1)$. 
On the other hand, we randomly bin all $A^N$ sequences into 
$2^{N(H(A|V,U)+\epsilon_3)}$ bins, calling them the source bins, and denote 
the corresponding bin index as $s(A^N)$. We also generate 
$2^{N(I(C;V)+\epsilon_4)}$ independent codewords $V^N(w_2)$ of length $N$, 
$w_2 \in \{1, \ldots, 2^{N(I(C;V)+\epsilon_4)}\}$, with distribution $\prod_{i=1}^N p(v_i)$. 

For each typical outcome of $A^N$, Alice finds a jointly 
typical $U^N(w_1)$. Then she reveals $a(w_1)$, the auxiliary bin 
index of $U^N(w_1)$, and $s(A^N)$, the source bin index of $A^N$, 
to both Bob and Eve; that is, the encoding function of 
Alice is composed of the pair $(a(w_1), s(A^N))$. Using standard 
techniques, it is possible to show that we have such a unique 
index pair with high probability. Charlie observes the outcome 
of its source $C^N$, finds a $V^N(w_2)$ that is jointly typical with $C^N$, 
and sends the index $w_2$ of $V^N$ over the private channel to Bob. 
With high probability there will be a unique $w_2$ such that 
$C^N$ and $V^N(w_2)$ are jointly typical. 

Bob, having access to $w_2$ and the auxiliary bin index 
$a(w_1)$, can find the jointly typical $U^N(w_1)$ correctly with high 
probability. Then using $U^N$, the source bin index $s(A^N)$ and 
$V^N(w_2)$, Bob can reliably decode $A^N$. Since $H(C|A, V) = 0$, 
knowing $A^N$ and $V^N$ correctly, Bob can find the correct $C^N$ 
with high probability as well. Letting $\epsilon_i \to 0$ for $i = 1, 2, 3$ 
and 4, we can make the total communication rate of Alice 
arbitrarily close to $I(A; U|V) + H(A|U, V) = H(A|V)$ and 
the rate of Charlie to $I(C; V)$. Since (1)-(5) hold, these rates 
can be communicated to Bob while having arbitrarily small 
error probability for sufficiently large $N$. 

The equivocation rate can be lower bounded as follows: 

\begin{align}
H(A^N | a(w_1), s(A^N), E^N) &= H(A^N) - I(A^N; a(w_1), E^N) - I(A^N; s(A^N) | E^N, a(w_1)) \notag\\
&\ge H(A^N) - I(A^N; U^N, E^N) - H(s(A^N)) \tag{9}\\
&\ge H(A^N | U^N, E^N) - N H(A|V,U) - N\epsilon_3 \tag{10}\\
&= N[H(A|U,E) - H(A|V,U) - \epsilon_3] \notag\\
&= N[I(A;V|U) - I(A;E|U) - \epsilon_3], \notag
\end{align}

where (9) follows from the data processing inequality; and 
(10) follows from the fact that $s(A^N)$ is a random variable 
over a set of size $2^{N(H(A|V,U)+\epsilon_3)}$. 
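
The last equality in the chain above, $H(A|U,E) - H(A|V,U) = I(A;V|U) - I(A;E|U)$, is a single-letter identity that can be checked numerically. The following is a minimal sketch on a hypothetical binary example that is not from the paper: $A$ is a uniform bit, $C$, $E$ and $U$ are obtained from $A$ through independent binary symmetric channels, and $V = C$, so that the joint distribution has the form $p_{ACE}\, p_{U|A}\, p_{V|C}$ with $H(C|A,V) = 0$. All function names and parameters are illustrative.

```python
import math
from collections import defaultdict
from itertools import product

A, C, E, U, V = range(5)  # positions of each variable in an outcome tuple


def bsc(p, x, y):
    """Transition probability of a binary symmetric channel."""
    return 1 - p if x == y else p


def joint_pmf(p_c=0.1, p_e=0.2, p_u=0.3):
    """Joint pmf of (A, C, E, U, V) for the assumed example, with V = C."""
    pmf = {}
    for a, c, e, u in product((0, 1), repeat=4):
        prob = 0.5 * bsc(p_c, a, c) * bsc(p_e, a, e) * bsc(p_u, a, u)
        pmf[(a, c, e, u, c)] = prob
    return pmf


def entropy(pmf, idx):
    """Joint entropy in bits of the variables at positions idx."""
    marg = defaultdict(float)
    for outcome, p in pmf.items():
        marg[tuple(outcome[i] for i in idx)] += p
    return -sum(p * math.log2(p) for p in marg.values() if p > 0)


def cond_entropy(pmf, x, given):
    return entropy(pmf, x + given) - entropy(pmf, given)


def cond_mi(pmf, x, y, given):
    return cond_entropy(pmf, x, given) - cond_entropy(pmf, x, y + given)


def equivocation_two_ways(pmf):
    """Return both sides of the identity used after (10)."""
    lhs = cond_entropy(pmf, (A,), (U, E)) - cond_entropy(pmf, (A,), (V, U))
    rhs = cond_mi(pmf, (A,), (V,), (U,)) - cond_mi(pmf, (A,), (E,), (U,))
    return lhs, rhs


print(equivocation_two_ways(joint_pmf()))
```

Setting `p_u=0.5` makes $U$ independent of $A$, which corresponds to a constant $U$; both sides then reduce to $H(A|E) - H(A|C)$ for this example.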
For $(U,V) \in \mathcal{V}_{in}$, we can show that 

\begin{align*}
I(A; V|U) - I(A; E|U) &\le I(A; C) \text{ and} \\
I(A; V|U) - I(A; E|U) &\le R_C - H(C|A).
\end{align*}

Hence (4) is not active in the inner bound. 
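Finally, we also have 

\begin{align*}
\frac{1}{N} H(A^N | a(w_1), s(A^N), E^N) &= \frac{1}{N}\left[H(A^N|E^N) - I(A^N; a(w_1), s(A^N) | E^N)\right] \\
&\ge H(A|E) - \frac{1}{N} H(a(w_1), s(A^N)) \\
&\ge H(A|E) - R_A.
\end{align*}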

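Outer bound: We define 

$$J \triangleq f_A(A^N) \quad \text{and} \quad K \triangleq f_C(C^N).$$

From Fano's inequality, we have 

\begin{align}
H(A^N, C^N | J, K) \le N\delta(P_e^N), \tag{11}
\end{align}

where $\delta(\cdot)$ is a non-negative function with $\lim_{x \to 0} \delta(x) = 0$. 

Define $U_i \triangleq (J, A^{i-1}, E^{i-1})$ and $V_i \triangleq (K, C^{i-1})$. Note 
that both $U_i - A_i - (C_i, E_i)$ and $V_i - C_i - (A_i, E_i)$ form Markov 
chains. Then, we have the following chain of inequalities: 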

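\begin{align}
N R_C &\ge H(K) \notag\\
&\ge I(C^N; K) \notag\\
&= \sum_{i=1}^N I(C_i; K | C^{i-1}) \tag{12}\\
&= \sum_{i=1}^N I(C_i; K, C^{i-1}) = \sum_{i=1}^N I(C_i; V_i), \tag{13}
\end{align}

where (12) follows from the chain rule. We also have 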

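\begin{align}
N R_A &\ge H(J) \ge H(J|K) \notag\\
&= H(A^N, J | K) - H(A^N | J, K) \notag\\
&\ge H(A^N | K) - N\delta(P_e^N) \tag{14}\\
&\ge \sum_{i=1}^N H(A_i | K, A^{i-1}, C^{i-1}) - N\delta(P_e^N) \tag{15}\\
&= \sum_{i=1}^N H(A_i | K, C^{i-1}) - N\delta(P_e^N) \tag{16}\\
&= \sum_{i=1}^N H(A_i | V_i) - N\delta(P_e^N), \tag{17}
\end{align}

where (14) follows from (11) and the nonnegativity of entropy; 
(15) follows as $A_i - (K, A^{i-1}) - C^{i-1}$ form a Markov chain; 
and (16) follows as $A_i - (K, C^{i-1}) - A^{i-1}$ form a Markov 
chain. 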

Next, we have the following set of inequalities: 

\begin{align}
N\delta(P_e^N) &\ge H(C^N | J, K) \tag{18}\\
&\ge H(C^N | A^N, K) \tag{19}\\
&= \sum_{i=1}^N H(C_i | A_i, K, C^{i-1}) \tag{20}\\
&= \sum_{i=1}^N H(C_i | A_i, V_i), \tag{21}
\end{align}

where (18) follows from (11); (19) follows since conditioning 
reduces entropy and $J$ is a function of $A^N$; and (21) follows 
from the definition of $V_i$. 

For the equivocation rate converse, we have 



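\begin{align}
N\Delta &\le H(A^N | J, E^N) \notag\\
&= H(A^N | J) - I(A^N; E^N | J) \notag\\
&= H(A^N | J, K) + I(A^N; K | J) - I(A^N; E^N | J) \notag\\
&\le N\delta(P_e^N) + \sum_{i=1}^N I(A_i; K | J, A^{i-1}) - \sum_{i=1}^N H(E_i | J, E^{i-1}) + H(E^N | A^N, J) \tag{22}\\
&\le N\delta(P_e^N) + \sum_{i=1}^N I(A_i; K | J, A^{i-1}, E^{i-1}) - \sum_{i=1}^N H(E_i | J, E^{i-1}, A^{i-1}) + H(E^N | A^N) \tag{23}\\
&\le N\delta(P_e^N) + \sum_{i=1}^N \left[I(A_i; K, C^{i-1} | J, A^{i-1}, E^{i-1}) - H(E_i | J, E^{i-1}, A^{i-1}) + H(E_i | A_i)\right] \tag{24}\\
&= \sum_{i=1}^N \left[I(A_i; V_i | U_i) - H(E_i | U_i) + H(E_i | A_i)\right] + N\delta(P_e^N) \tag{25}\\
&= \sum_{i=1}^N \left[I(A_i; V_i | U_i) - I(A_i; E_i | U_i)\right] + N\delta(P_e^N), \tag{26}
\end{align}

where (22) follows from Fano's inequality and the chain rule; 
(23) follows from the memoryless property of the source and 
the side information sequences, and the fact that conditioning 
reduces entropy; (24) follows from the chain rule and the 
non-negativity of the mutual information; (25) follows from the 
definitions of $V_i$ and $U_i$; and finally (26) follows since 
$U_i - A_i - E_i$ form a Markov chain. 

We also have 

\begin{align}
N\delta(P_e^N) &\ge H(A^N, C^N | J, K) \tag{27}\\
&= H(J, K | A^N, C^N) + H(A^N, C^N) - H(J, K) \notag\\
&\ge H(A^N) + H(C^N) - I(A^N; C^N) - H(J) - H(K) \notag\\
&= H(A^N | J) - H(J | A^N) + N H(C|A) - H(K) \notag\\
&\ge H(A^N | J, E^N) + N H(C|A) - N R_C, \notag
\end{align}

where (27) follows from (11). We get 

\begin{align}
\delta(P_e^N) \ge \Delta + H(C|A) - R_C, \tag{28}
\end{align}

and 

\begin{align*}
N\delta(P_e^N) &\ge H(A^N, C^N | J, K) \\
&= H(A^N, C^N) - I(A^N, C^N; J, K) \\
&= H(A^N, C^N) - H(A^N) + H(A^N | J) - H(C^N) + H(C^N | K) + I(K; J) \\
&\ge H(A^N | J, E^N) - I(A^N; C^N) \\
&\ge N\Delta - N I(A; C).
\end{align*}

And finally we have 

\begin{align}
H(A|E) &\le \frac{1}{N} H(A^N, J | E^N) \notag\\
&= \frac{1}{N}\left[H(J | E^N) + H(A^N | E^N, J)\right] \notag\\
&\le \frac{1}{N} H(J) + \Delta \notag\\
&\le R_A + \Delta. \tag{29}
\end{align}
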

Now, we define a new independent random variable $Q$ 
uniformly distributed over the set $\{1, 2, \ldots, N\}$, and $A \triangleq A_Q$, 
$E \triangleq E_Q$, $V \triangleq (V_Q, Q)$, and $U \triangleq (U_Q, Q)$. Letting $N \to \infty$ 
and $P_e^N \to 0$, we obtain the outer bound in the theorem. 